{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Notebook Description\n", "\n", "Often we want to compare if two files with seizure annotations contain the same annotations. For example, if you look through a week of recordings and annotate the sezures, comparing a classifier's predictions with your annotations will allow you to check the number of false positives and (more importantly) false negatives. " ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For an example we will look at the difference between a raw predictions output and the same file after it has been manually checked and the false postives removed using the gui. " ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "checked_preds = pd.read_csv('./example_data/checked_predictions.csv', index_col=0)\n", "raw_preds = pd.read_csv('./example_data/raw_predictions.csv', index_col=0)\n", "\n", "checked_preds = pd.read_csv('/media/jonathan/My Passport-ELE -EEG/2019/analysis pyecog/01b_baseline_annotation_only.csv',\n", " index_col=0)\n", "raw_preds = pd.read_csv('/media/jonathan/My Passport-ELE -EEG/2019/analysis pyecog/prediction from Tawfeeq Library/predictions_baselineconverted01.csv',\n", " index_col=0)\n", "# \n", "path= r'nathan\\My Passport-ELE -EEG\\2019/analysis pyecog/prediction from Ta'\n", "raw_preds = pd.read_csv(path, index_col=0)\n" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5\n", "10\n" ] }, { "data": { "text/plain": [ "10" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = 5\n", "print(x)\n", "x = 10\n", "print(x)\n", "x" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# note to jonny 2019 :\n", "\n", "The file output by the clf does not havethe [] areound the transmitter. Whereas " ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
old_indexfilenamestartenddurationtransmitterreal_startreal_end
00M1566408624_2019-08-21-18-30-24_tids_[142].h5125.0285.0160.0[142]2019-08-21 18:32:292019-08-21 18:35:09
11M1566459024_2019-08-22-08-30-24_tids_[119].h51965.02025.060.0[119]2019-08-22 09:03:092019-08-22 09:04:09
22M1566459024_2019-08-22-08-30-24_tids_[119].h52125.02195.070.0[119]2019-08-22 09:05:492019-08-22 09:06:59
33M1566459024_2019-08-22-08-30-24_tids_[119].h52290.02305.015.0[119]2019-08-22 09:08:342019-08-22 09:08:49
44M1566459024_2019-08-22-08-30-24_tids_[119].h52380.02425.045.0[119]2019-08-22 09:10:042019-08-22 09:10:49
\n", "
" ], "text/plain": [ " old_index filename start end \\\n", "0 0 M1566408624_2019-08-21-18-30-24_tids_[142].h5 125.0 285.0 \n", "1 1 M1566459024_2019-08-22-08-30-24_tids_[119].h5 1965.0 2025.0 \n", "2 2 M1566459024_2019-08-22-08-30-24_tids_[119].h5 2125.0 2195.0 \n", "3 3 M1566459024_2019-08-22-08-30-24_tids_[119].h5 2290.0 2305.0 \n", "4 4 M1566459024_2019-08-22-08-30-24_tids_[119].h5 2380.0 2425.0 \n", "\n", " duration transmitter real_start real_end \n", "0 160.0 [142] 2019-08-21 18:32:29 2019-08-21 18:35:09 \n", "1 60.0 [119] 2019-08-22 09:03:09 2019-08-22 09:04:09 \n", "2 70.0 [119] 2019-08-22 09:05:49 2019-08-22 09:06:59 \n", "3 15.0 [119] 2019-08-22 09:08:34 2019-08-22 09:08:49 \n", "4 45.0 [119] 2019-08-22 09:10:04 2019-08-22 09:10:49 " ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_preds.head()" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(112, 8)" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "raw_preds.shape" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(77, 8)" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "checked_preds.shape" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
old_indexfilenamestartenddurationtransmitterreal_startreal_end
2414M1567629024_2019-09-04-21-30-24_tids_[28, 33, ...860.18939.9879.80[119]04/09/2019 21:4404/09/2019 21:46
4407M1567611024_2019-09-04-16-30-24_tids_[28, 33, ...2976.583031.5654.98[119]04/09/2019 17:2004/09/2019 17:20
7380M1567524624_2019-09-03-16-30-24_tids_[28, 33, ...26.2668.7942.53[119]03/09/2019 16:3003/09/2019 16:31
9377M1567521024_2019-09-03-15-30-24_tids_[28, 33, ...3298.903347.5648.66[119]03/09/2019 16:2503/09/2019 16:26
10376M1567521024_2019-09-03-15-30-24_tids_[28, 33, ...2996.263057.7361.47[119]03/09/2019 16:2003/09/2019 16:21
\n", "
" ], "text/plain": [ " old_index filename start \\\n", "2 414 M1567629024_2019-09-04-21-30-24_tids_[28, 33, ... 860.18 \n", "4 407 M1567611024_2019-09-04-16-30-24_tids_[28, 33, ... 2976.58 \n", "7 380 M1567524624_2019-09-03-16-30-24_tids_[28, 33, ... 26.26 \n", "9 377 M1567521024_2019-09-03-15-30-24_tids_[28, 33, ... 3298.90 \n", "10 376 M1567521024_2019-09-03-15-30-24_tids_[28, 33, ... 2996.26 \n", "\n", " end duration transmitter real_start real_end \n", "2 939.98 79.80 [119] 04/09/2019 21:44 04/09/2019 21:46 \n", "4 3031.56 54.98 [119] 04/09/2019 17:20 04/09/2019 17:20 \n", "7 68.79 42.53 [119] 03/09/2019 16:30 03/09/2019 16:31 \n", "9 3347.56 48.66 [119] 03/09/2019 16:25 03/09/2019 16:26 \n", "10 3057.73 61.47 [119] 03/09/2019 16:20 03/09/2019 16:21 " ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "checked_preds.head()" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "So we expect there to be 35 false positives\n" ] } ], "source": [ "print('So we expect there to be', raw_preds.shape[0]-checked_preds.shape[0], 'false positives')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Code for comparing the dataframes" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "def add_mcode_tid_col(df):\n", " '''Note this expects file start to be of format: M1513966209'''\n", " df['mcode_tid'] = df.filename.str.slice(0,11)+'_'+df.transmitter.astype(str)\n", " return df\n", "\n", "def check_overlap(series1,series2):\n", " ''' pandas series should both have start and end columns\n", " http://baodad.blogspot.co.uk/2014/06/date-range-overlap.html\n", " '''\n", " start_a, end_a = float(series1.start), float(series1.end)\n", " start_b, end_b = float(series2.start), float(series2.end)\n", " overlap_bool = (start_a <= end_b) and (end_a>=start_b)\n", " return overlap_bool\n", "\n", "def calculate_overlap(series1,series2):\n", " ''' pandas series should both have start and end attrs\n", " http://baodad.blogspot.co.uk/2014/06/date-range-overlap.html\n", " '''\n", " a, b = float(series1.start), float(series1.end)\n", " c, d = float(series2.start), float(series2.end)\n", " overlap = min([b-a,b-c,d-c,d-a])\n", " return overlap\n", "\n", "def compare_dfs(prediction_df, annotation_df):\n", " ''' \n", " Function to check how much of prediction_df is found in annotation_df\n", " \n", " Returns two dataframes:\n", " - preds_in_annotations_df: the predictions found within the annotations and the amount of overlap\n", " (probably corresponding to true positives). Has an overlap column (seconds).\n", " - preds_not_in_annotations_df: the predictions not found in the annotations\n", " (probably corresponding to false positives)\n", " \n", " Note:\n", " To check for false negatives, or missed seizures, pass in the actual annotations\n", " as the 'prediction_df' and the actual predictions as the 'annotation_df'. \n", " Here we would hope for no 'false positives', so the second dataframe will contain the \n", " missed seizures...\n", " '''\n", " \n", " # first add a 'mcode_tid' col: allows us to check for same hour and transmitter\n", " prediction_df = add_mcode_tid_col(prediction_df)\n", " annotation_df = add_mcode_tid_col(annotation_df)\n", " \n", " # Create empty dataframes that we will add to below\n", " preds_in_annotations_df = pd.DataFrame(columns = prediction_df.columns)\n", " preds_not_in_annotations_df = pd.DataFrame(columns = prediction_df.columns)\n", " \n", " # loop over the predictions\n", " for _, prediction_row_series in prediction_df.iterrows():\n", " overlap_bool = False # boolean for if the predicted seizure at all overlaps with an annotation\n", " # first check if the hour&transmitter is in the annotations\n", " if prediction_row_series.mcode_tid in annotation_df.mcode_tid.unique():\n", " \n", " # next find all annotations with same hour and tid as the prediction row\n", " # this will often just be one row, but if >1 seizures in a single hour will be more\n", " revevant_annotations_df = annotation_df[annotation_df.mcode_tid.isin([prediction_row_series.mcode_tid])]\n", " \n", " # finally check if the start and end columns overlap\n", " t_overlap = 0 # store the overlap time between preds and seizures\n", " for _, annotation_row_series in revevant_annotations_df.iterrows(): \n", " row_overlap = check_overlap(prediction_row_series,\n", " annotation_row_series) # in the case that two seizures, want to add...\n", " overlap_bool += row_overlap\n", " if row_overlap: # is this robust to two seizures?\n", " t_overlap += calculate_overlap(prediction_row_series,\n", " annotation_row_series)\n", " \n", " if overlap_bool>0:\n", " prediction_row_series['overlap'] = t_overlap\n", " preds_in_annotations_df = preds_in_annotations_df.append(prediction_row_series)\n", " else:\n", " preds_not_in_annotations_df = preds_not_in_annotations_df.append(prediction_row_series)\n", " \n", " return preds_in_annotations_df, preds_not_in_annotations_df\n", "\n", "true_positives, false_positives = compare_dfs(raw_preds,checked_preds) " ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((67, 10), (45, 9))" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "true_positives.shape, false_positives.shape" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
old_indexfilenamestartenddurationtransmitterreal_startreal_endmcode_tidoverlap
11.0M1566459024_2019-08-22-08-30-24_tids_[119].h51965.02025.060.0[119]2019-08-22 09:03:092019-08-22 09:04:09M1566459024_[119]58.43
22.0M1566459024_2019-08-22-08-30-24_tids_[119].h52125.02195.070.0[119]2019-08-22 09:05:492019-08-22 09:06:59M1566459024_[119]39.32
33.0M1566459024_2019-08-22-08-30-24_tids_[119].h52290.02305.015.0[119]2019-08-22 09:08:342019-08-22 09:08:49M1566459024_[119]15.00
44.0M1566459024_2019-08-22-08-30-24_tids_[119].h52380.02425.045.0[119]2019-08-22 09:10:042019-08-22 09:10:49M1566459024_[119]43.08
55.0M1566459024_2019-08-22-08-30-24_tids_[119].h52585.02635.050.0[119]2019-08-22 09:13:292019-08-22 09:14:19M1566459024_[119]50.00
\n", "
" ], "text/plain": [ " old_index filename start end \\\n", "1 1.0 M1566459024_2019-08-22-08-30-24_tids_[119].h5 1965.0 2025.0 \n", "2 2.0 M1566459024_2019-08-22-08-30-24_tids_[119].h5 2125.0 2195.0 \n", "3 3.0 M1566459024_2019-08-22-08-30-24_tids_[119].h5 2290.0 2305.0 \n", "4 4.0 M1566459024_2019-08-22-08-30-24_tids_[119].h5 2380.0 2425.0 \n", "5 5.0 M1566459024_2019-08-22-08-30-24_tids_[119].h5 2585.0 2635.0 \n", "\n", " duration transmitter real_start real_end \\\n", "1 60.0 [119] 2019-08-22 09:03:09 2019-08-22 09:04:09 \n", "2 70.0 [119] 2019-08-22 09:05:49 2019-08-22 09:06:59 \n", "3 15.0 [119] 2019-08-22 09:08:34 2019-08-22 09:08:49 \n", "4 45.0 [119] 2019-08-22 09:10:04 2019-08-22 09:10:49 \n", "5 50.0 [119] 2019-08-22 09:13:29 2019-08-22 09:14:19 \n", "\n", " mcode_tid overlap \n", "1 M1566459024_[119] 58.43 \n", "2 M1566459024_[119] 39.32 \n", "3 M1566459024_[119] 15.00 \n", "4 M1566459024_[119] 43.08 \n", "5 M1566459024_[119] 50.00 " ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "true_positives.head()" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
old_indexfilenamestartenddurationtransmitterreal_startreal_endmcode_tid
00.0M1566408624_2019-08-21-18-30-24_tids_[142].h5125.0285.0160.0[142]2019-08-21 18:32:292019-08-21 18:35:09M1566408624_[142]
2121.0M1566765024_2019-08-25-21-30-24_tids_[119].h5875.0950.075.0[119]2019-08-25 21:44:592019-08-25 21:46:14M1566765024_[119]
2323.0M1566822624_2019-08-26-13-30-24_tids_[119].h52545.02625.080.0[119]2019-08-26 14:12:492019-08-26 14:14:09M1566822624_[119]
2424.0M1566822624_2019-08-26-13-30-24_tids_[119].h52715.02760.045.0[119]2019-08-26 14:15:392019-08-26 14:16:24M1566822624_[119]
2525.0M1566822624_2019-08-26-13-30-24_tids_[119].h52870.02910.040.0[119]2019-08-26 14:18:142019-08-26 14:18:54M1566822624_[119]
\n", "
" ], "text/plain": [ " old_index filename start end \\\n", "0 0.0 M1566408624_2019-08-21-18-30-24_tids_[142].h5 125.0 285.0 \n", "21 21.0 M1566765024_2019-08-25-21-30-24_tids_[119].h5 875.0 950.0 \n", "23 23.0 M1566822624_2019-08-26-13-30-24_tids_[119].h5 2545.0 2625.0 \n", "24 24.0 M1566822624_2019-08-26-13-30-24_tids_[119].h5 2715.0 2760.0 \n", "25 25.0 M1566822624_2019-08-26-13-30-24_tids_[119].h5 2870.0 2910.0 \n", "\n", " duration transmitter real_start real_end \\\n", "0 160.0 [142] 2019-08-21 18:32:29 2019-08-21 18:35:09 \n", "21 75.0 [119] 2019-08-25 21:44:59 2019-08-25 21:46:14 \n", "23 80.0 [119] 2019-08-26 14:12:49 2019-08-26 14:14:09 \n", "24 45.0 [119] 2019-08-26 14:15:39 2019-08-26 14:16:24 \n", "25 40.0 [119] 2019-08-26 14:18:14 2019-08-26 14:18:54 \n", "\n", " mcode_tid \n", "0 M1566408624_[142] \n", "21 M1566765024_[119] \n", "23 M1566822624_[119] \n", "24 M1566822624_[119] \n", "25 M1566822624_[119] " ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "false_positives.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Here save the false positives to check through using the gui\n", "\n", "- they might not all be false positives!" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "savename = 'predictions_not_in_annotations.csv'\n", "false_positives.to_csv(savename,header=False, index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Here flip the order of dataframes:\n", "\n", "```\n", "Note:\n", " To check for false negatives, or missed seizures, pass in the actual annotations\n", " as the 'prediction_df' and the actual predictions as the 'annotation_df'. \n", " Here we would hope for no 'false positives', so the second dataframe will contain the \n", " missed seizures...\n", "```\n", "\n", "Pass in checked preds as the predctions. This is similar to the case where you are checking for missed seizures" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [], "source": [ "true_positives, false_positives = compare_dfs(checked_preds,raw_preds) " ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(69, 10)" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "true_positives.shape" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [], "source": [ "false_positives.shape\n", "savename = 'annotations_not_in_predictions.csv'\n", "false_positives.to_csv(savename,header=True, index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "as expected no false positives. If there were these would be annoations that had been missed (if the predictions were over the same time period as the annotations)" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(8, 9)" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "false_positives.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.6" } }, "nbformat": 4, "nbformat_minor": 2 }